.\"
.\" This file and its contents are supplied under the terms of the
.\" Common Development and Distribution License ("CDDL"), version 1.0.
.\" You may only use this file in accordance with the terms of version
.\" 1.0 of the CDDL.
.\"
.\" A full copy of the text of the CDDL should have accompanied this
.\" source.  A copy of the CDDL is also available via the Internet at
.\" http://www.illumos.org/license/CDDL.
.\"
.\"
.\" Copyright 2024 Oxide Computer Company
.\"
.Dd April 28, 2024
.Dt INTRO 9E
.Os
.Sh NAME
.Nm Intro
.Nd introduction to device driver entry points
.Sh DESCRIPTION
Section 9E of the manual describes the entry points and building blocks that are
used to build and implement all kinds of device drivers and kernel modules.
Modules and device drivers are often talked about interchangeably.
The operating system is built around the idea of loadable kernel modules.
Device drivers are the primary type that we think about; however, there are
loadable kernel modules for file systems, STREAMS devices, and even system
calls!
.Pp
The vast majority of this section focuses on documenting device
.Pq and STREAMS
drivers.
Device drivers are further broken down into different categories depending on
what they are targeting.
For example, there are dedicated frameworks for SCSI/SAS HBA drivers, networking
drivers, USB drivers, and then general character and block device drivers.
While most of the time we think about device drivers as corresponding to a piece
of physical hardware, there are also pseudo-device drivers which are device
drivers that provide functionality, but aren't backed by any hardware.
For example,
.Xr dtrace 4D
and
.Xr lofi 4D
are both pseudo-device drivers.
.Pp
To help understand the relationship between these different types of things,
consider the following image:
.Bd -literal
  +--------------------+
  |                    |
  |  Loadable Modules  |
  |                    |
  +--------------------+
    |                          +--------------+      +------------+
    |                          |              |      |            |
    +------------------------->| Cryptography | ...  | Scheduling |  ...
    |                          |              |      |            |
    |                          +--------------+      +------------+
    |   +----------------+     +--------------+     +--------------+
    |   |                |     |              |     |              |
    +-->| Device Drivers | ... | File Systems | ... | System Calls | ...
        |                |     |              |     |              |
        +----------------+     +--------------+     +--------------+
                v
    +-----------+
    |
    |   +------------+  +---------+     +-----------+     +-----------+
    +-->| Networking |->| igb(4D) | ... | mlxcx(4D) | ... | cxgbe(4D) | ...
    |   +------------+  +---------+     +-----------+     +-----------+
    |
    |   +-------+       +----------+     +-------------+     +----------+
    +-->|  HBA  |------>| smrt(4D) | ... | mpt_sas(4D) | ... | ahci(4D) | ...
    |   +-------+       +----------+     +-------------+     +----------+
    |
    |   +-------+       +--------------+     +----------+     +---------+
    +-->|  USB  |------>| scsa2usb(4D) | ... | ccid(4D) | ... | hid(4D) | ...
    |   +-------+       +--------------+     +----------+     +---------+
    |
    |   +---------+     +-------------+     +-------------+
    +-->| Sensors |---->| smntemp(4D) | ... | pchtemp(4D) | ...
    |   +---------+     +-------------+     +-------------+
    |
    +-------+-------------+-----------+----------+
            |             v           v          |
            v       +-----------+  +-----+       v
        +-------+   | Character |  | USB |   +-------+
        | Audio |   | and Block |  | HCD |   | Nexus |  ...
        +-------+   |  Devices  |  +-----+   +-------+
                    +-----------+
.Ed
.Pp
The above diagram attempts to explain, at a high level, some of the
relationships that were mentioned earlier.
All device drivers are loadable modules that leverage the
.Xr modldrv 9S
structure and implement similar
.Xr _init 9E
and
.Xr _fini 9E
entry points.
.Pp
Some hardware implements more than one type of thing.
The most common example here would be a NIC that implements a temperature sensor
or a current sensor.
Many devices also implement and leverage the kernel statistics framework called
.Dq kstats .
A device driver is not strictly limited to only a single class of thing.
For example, many USB client devices are networking device drivers.
In the subsequent sections we'll go into the functions and structures that are
related to creating the different device drivers and their associated
functions.
.Ss Kernel Initialization
To begin with, all loadable modules in the system are required to implement
three entry points.
If these entry points are not present, then the module cannot be installed in
the system.
These entry points are
.Xr _init 9E ,
.Xr _fini 9E ,
and
.Xr _info 9E .
.Pp
The
.Xr _init 9E
entry point will be the first thing called in the module and this is where
any global initialization should be taken care of.
Once all global state has been successfully created, the driver should call
.Xr mod_install 9F
to actually register with the system.
Conversely,
.Xr _fini 9E
is used to tear down the module.
The driver uses
.Xr mod_remove 9F
to first remove the driver from the system and then it can tear down any global
state that was added there.
.Pp
While we mention global state here, this isn't widely used in most device
drivers.
A device driver can have multiple instances instantiated, one for each instance
of a hardware device that is found and most state is tied to those instances.
We'll discuss that more in the next section.
.Pp
The
.Xr _info 9E
entry point these days just calls
.Xr mod_info 9F
directly and returns its result.
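.Pp
As a minimal sketch, a driver's three module entry points often look something
like the following.
The
.Ql xx
prefix, the
.Vt xx_t
type, and the
.Va xx_state
soft state anchor are hypothetical names used for illustration; the
.Va xx_modlinkage
structure is of the kind described below.
.Bd -literal
static void *xx_state;

int
_init(void)
{
        int ret;

        /* Set up global state before registering with the system. */
        ret = ddi_soft_state_init(&xx_state, sizeof (xx_t), 0);
        if (ret != 0) {
                return (ret);
        }

        ret = mod_install(&xx_modlinkage);
        if (ret != 0) {
                ddi_soft_state_fini(&xx_state);
        }

        return (ret);
}

int
_info(struct modinfo *modinfop)
{
        return (mod_info(&xx_modlinkage, modinfop));
}

int
_fini(void)
{
        int ret;

        /* Only tear down global state once removal has succeeded. */
        ret = mod_remove(&xx_modlinkage);
        if (ret == 0) {
                ddi_soft_state_fini(&xx_state);
        }

        return (ret);
}
.Ed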
.Pp
All of these entry points directly or indirectly require a
.Vt "struct modlinkage" .
This structure is used by all types of loadable kernel modules and is filled in
with information that varies based on the type of module one is creating.
Here, everything that we're creating is going to use a
.Vt "struct modldrv" ,
which describes a loadable driver.
Every device driver will declare a static global variable for these and fill
them out.
They are documented in
.Xr modlinkage 9S
and
.Xr modldrv 9S
respectively.
.Pp
The following is an example of these structures borrowed from
.Xr igc 4D :
.Bd -literal
static struct modldrv igc_modldrv = {
        .drv_modops = &mod_driverops,
        .drv_linkinfo = "Intel I225/226 Ethernet Controller",
        .drv_dev_ops = &igc_dev_ops
};

static struct modlinkage igc_modlinkage = {
        .ml_rev = MODREV_1,
        .ml_linkage = { &igc_modldrv, NULL }
};
.Ed
.Pp
From this there are a few important things to take away.
A single kernel module may implement more than one type of linkage, though this
is the exception and not the norm.
The second part to call out here is that while the
.Fa drv_modops
will be the same for all drivers that use the
.Vt "struct modldrv" ,
the
.Fa drv_linkinfo
and
.Fa drv_dev_ops
will be unique to each driver.
The next section discusses the
.Vt "struct dev_ops" .
.Ss The Device Tree and Instances
Device drivers have a unique challenge that makes them different from other
kinds of loadable modules: there may very well be more than a single instance of
the hardware that they support.
Consider a few examples: a user can plug in two distinct USB mass storage
devices or keyboards.
A system may have more than one NIC present or the hardware may expose multiple
physical ports as distinct devices.
Many systems have more than one disk device.
Conversely, if a given piece of hardware isn't present then there's no reason
for the driver for it to be loaded.
There is nothing that the Intel 1 GbE Ethernet NIC driver,
.Xr igb 4D ,
can do if there are no supported devices plugged in.
.Pp
Devices are organized into a tree that is full of parent and child
relationships.
This tree is what you see when you run
.Xr prtconf 8 .
As an example, a USB device is plugged into a port on a hub, which may be
plugged into another hub, and then is eventually plugged into a PCI device that
is the USB host controller, which itself may be under a PCI-PCI bridge, and this
chain continues all the way up to the root of the tree, which we call
.Dq rootnex .
Device drivers that can enumerate children and provide operations for them are
called
.Dq nexus
drivers.
.Pp
The system automatically fills out the device tree through a combination of
built-in mechanisms and through operations on other nexus drivers.
When a new hardware unit is discovered, a
.Vt dev_info_t
structure, the device information, is created for it and it is linked into the
tree.
Generally, the system can then use information embedded in the device to
determine what driver is responsible for the piece of hardware through the
use of the
.Dq compatible
property which the system and nexus drivers set up on their children.
For example, PCI and PCIe drivers automatically set up the compatible property
based on information discovered in PCI configuration space like the device's
vendor ID, device ID, and class IDs.
The same is true of USB.
.Pp
When a device driver is packaged, it contains metadata that indicates which
devices it supports.
For example, the aforementioned igb driver will have a rule that it matches
.Dq pciex8086,10a7 .
When the kernel discovers a device with this alias present, it will know that it
should assign it to the igb driver and then it will assign the
.Vt dev_info_t
structure a new instance number.
.Pp
To emphasize here, each time the device is discovered in the tree, it will have
an independent instance number and an independent
.Vt dev_info_t
that accompanies it.
Each instance has an independent lifetime too.
The most obvious way to think about this is with something that can be
physically removed while the system is on, like a USB device.
Pulling one USB keyboard does not impact the other one that is still plugged
in.
They are inherently different devices
.Po
though if they were plugged into the same hub and the hub was removed, then
both would be removed; however, each would be acted on independently
.Pc .
.Pp
Here is a slimmed down example from a system's
.Xr prtconf 8
output:
.Bd -literal
Oxide,Gimlet (driver name: rootnex)
    scsi_vhci, instance #0 (driver name: scsi_vhci)
    pci, instance #0 (driver name: npe)
        pci1022,1480, instance #13 (driver name: amdzen_stub)
        pci1022,164f
        pci1022,1482
        pci1de,fff9, instance #0 (driver name: pcieb)
            pci1344,3100, instance #4 (driver name: nvme)
                blkdev, instance #10 (driver name: blkdev)
        pci1022,1482
        pci1022,1482
        pci1de,fff9, instance #1 (driver name: pcieb)
            pci1b96,0, instance #7 (driver name: nvme)
                blkdev, instance #0 (driver name: blkdev)
        pci1de,fff9, instance #2 (driver name: pcieb)
            pci1b96,0, instance #8 (driver name: nvme)
                blkdev, instance #4 (driver name: blkdev)
        pci1de,fff9, instance #3 (driver name: pcieb)
            pci1b96,0, instance #10 (driver name: nvme)
                blkdev, instance #1 (driver name: blkdev)
.Ed
.Pp
From this we can see that there are multiple instances of the NVMe
.Pq nvme ,
PCIe bridge
.Pq pcieb ,
and
generic block device
.Pq blkdev
driver present.
Each instance has its own
.Vt dev_info_t
and its entry points may be called in parallel with those of other instances.
With that, let's dig into the specifics of what the
.Vt "struct dev_ops"
actually is and the different operations to be aware of.
.Ss struct dev_ops
The device operations structure,
.Vt "struct dev_ops" ,
controls all of the basic entry points that a device driver contains.
This is something that every driver has to implement, no matter the type.
The most important things that will be present are the
.Fa devo_attach
and
.Fa devo_detach
members which are used to create and destroy instances of the driver and then a
pointer to any subsequent operations that exist, such as the
.Fa devo_cb_ops ,
which is used for character and block device drivers and the
.Fa devo_bus_ops ,
which is used for nexus drivers.
.Pp
Attach and detach are the most important entry points in this structure.
Attach can practically be thought of as the
.Dq main
function entry point for a device driver.
This is where any initialization of the instance will occur.
This would include many traditional things like setting up access to registers,
allocating and assigning interrupts, and interfacing with the various other
device driver frameworks such as
.Xr mac 9E .
.Pp
The actions taken here are generally device-specific, though certain classes of
devices
.Pq e.g. PCI, USB, etc.
will have overlapping concerns.
In addition, this is where the driver will take care of creating anything like a
minor node which will be used to access it by userland software if it's a
character or block device driver.
.Pp
There is generally a per-instance data structure that a driver creates.
It may do this by calling
.Xr kmem_zalloc 9F
and assigning the structure with the
.Xr ddi_set_driver_private 9F
function or it may use the DDI's soft state management functions rooted in
.Xr ddi_soft_state_init 9F .
A driver should tie as much state to the instance as possible.
There should not be anything like a fixed size global array of possible
instances.
Someone usually finds a way to attach many more instances of some type of
hardware than you might expect!
.Pp
The
.Xr attach 9E
and
.Xr detach 9E
entry points both have a unique command argument that is used to describe a
specific action that is going on.
This action may be a normal attach or it could be related to putting the system
into the ACPI S3 sleep or similar state with the suspend and resume commands.
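.Pp
The following is a hedged sketch of how a simple character device driver might
structure these two entry points.
All of the
.Ql xx
names are hypothetical, the driver is assumed to have already called
.Xr ddi_soft_state_init 9F
on
.Va xx_state
in
.Xr _init 9E ,
and the suspend and resume commands are left unsupported for brevity.
.Bd -literal
typedef struct xx {
        dev_info_t *xx_dip;
} xx_t;

static int
xx_attach(dev_info_t *dip, ddi_attach_cmd_t cmd)
{
        int inst = ddi_get_instance(dip);
        xx_t *xx;

        /* This sketch only handles a normal attach, not DDI_RESUME. */
        if (cmd != DDI_ATTACH) {
                return (DDI_FAILURE);
        }

        if (ddi_soft_state_zalloc(xx_state, inst) != DDI_SUCCESS) {
                return (DDI_FAILURE);
        }
        xx = ddi_get_soft_state(xx_state, inst);
        xx->xx_dip = dip;

        /* Device-specific register and interrupt setup would go here. */

        if (ddi_create_minor_node(dip, "xx", S_IFCHR, (minor_t)inst,
            DDI_PSEUDO, 0) != DDI_SUCCESS) {
                ddi_soft_state_free(xx_state, inst);
                return (DDI_FAILURE);
        }

        return (DDI_SUCCESS);
}

static int
xx_detach(dev_info_t *dip, ddi_detach_cmd_t cmd)
{
        /* This sketch only handles a normal detach, not DDI_SUSPEND. */
        if (cmd != DDI_DETACH) {
                return (DDI_FAILURE);
        }

        ddi_remove_minor_node(dip, NULL);
        ddi_soft_state_free(xx_state, ddi_get_instance(dip));
        return (DDI_SUCCESS);
}
.Ed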
.Pp
The following table lists the common entry points in the
.Vt "struct dev_ops"
that most drivers end up having to think a little bit about:
.Bl -column -offset indent "mac_capab_transceiver" "mac_capab_transceiver"
.It Xr attach 9E Ta Xr detach 9E
.It Xr getinfo 9E Ta Xr quiesce 9E
.El
.Pp
Briefly, the
.Xr getinfo 9E
entry point is used to map between instances of a device driver and the minor
nodes it creates.
Drivers that participate in a framework such as SCSI HBA or networking don't
usually end up implementing this.
However, drivers that manually create minor nodes generally do.
The
.Xr quiesce 9E
entry point is used as part of the fast reboot operation.
It is basically intended to stop and/or reset the hardware and discard any
ongoing I/O.
Pseudo-device drivers and drivers which do not perform I/O can use the
symbol
.Ql ddi_quiesce_not_needed
in lieu of a standard implementation.
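.Pp
As an illustration, a
.Xr getinfo 9E
implementation for a driver that manually creates its minor nodes might look
like the sketch below.
It assumes, hypothetically, that each instance creates a single minor node
whose minor number is the instance number and that per-instance state of type
.Vt xx_t
lives in a soft state anchor called
.Va xx_state .
.Bd -literal
static int
xx_getinfo(dev_info_t *dip, ddi_info_cmd_t cmd, void *arg, void **resultp)
{
        minor_t minor = getminor((dev_t)arg);
        xx_t *xx;

        switch (cmd) {
        case DDI_INFO_DEVT2DEVINFO:
                /* Map the dev_t back to the instance's dev_info_t. */
                xx = ddi_get_soft_state(xx_state, (int)minor);
                if (xx == NULL) {
                        return (DDI_FAILURE);
                }
                *resultp = xx->xx_dip;
                break;
        case DDI_INFO_DEVT2INSTANCE:
                /* In this sketch, the minor number is the instance. */
                *resultp = (void *)(uintptr_t)minor;
                break;
        default:
                return (DDI_FAILURE);
        }

        return (DDI_SUCCESS);
}
.Ed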
.Pp
In addition, the following entry points exist, but are less commonly required,
typically because the system takes care of them, as is the case with
.Xr probe 9E .
.Bl -column -offset indent "mac_capab_transceiver" "mac_capab_transceiver"
.It Xr identify 9E Ta Xr power 9E
.It Xr probe 9E Ta
.El
.Pp
For more information on the structure, see also
.Xr dev_ops 9S .
The following are a few examples of the
.Vt "struct dev_ops"
structure from a few drivers.
We recommend using the C99 style for all new instances.
.Bd -literal
static struct dev_ops ksensor_dev_ops = {
        .devo_rev = DEVO_REV,
        .devo_refcnt = 0,
        .devo_getinfo = ksensor_getinfo,
        .devo_identify = nulldev,
        .devo_probe = nulldev,
        .devo_attach = ksensor_attach,
        .devo_detach = ksensor_detach,
        .devo_reset = nodev,
        .devo_power = ddi_power,
        .devo_quiesce = ddi_quiesce_not_needed,
        .devo_cb_ops = &ksensor_cb_ops
};

static struct dev_ops igc_dev_ops = {
        .devo_rev = DEVO_REV,
        .devo_refcnt = 0,
        .devo_getinfo = NULL,
        .devo_identify = nulldev,
        .devo_probe = nulldev,
        .devo_attach = igc_attach,
        .devo_detach = igc_detach,
        .devo_reset = nodev,
        .devo_quiesce = ddi_quiesce_not_supported,
        .devo_cb_ops = &igc_cb_ops
};

static struct dev_ops pchtemp_dev_ops = {
        .devo_rev = DEVO_REV,
        .devo_refcnt = 0,
        .devo_getinfo = nodev,
        .devo_identify = nulldev,
        .devo_probe = nulldev,
        .devo_attach = pchtemp_attach,
        .devo_detach = pchtemp_detach,
        .devo_reset = nodev,
        .devo_quiesce = ddi_quiesce_not_needed
};
.Ed
.Ss Character and Block Operations
In the history of UNIX, the most common device drivers that were created were
for block and character devices.
The interfaces in block and character devices are usually in service of common
I/O patterns that the system exposes.
For example, when you call
.Xr open 2 ,
.Xr ioctl 2 ,
or
.Xr read 2
on a device, it goes through the device's corresponding entry point here.
Both block and character devices operate on the shared
.Vt "struct cb_ops"
structure, with different members being expected for both of them.
While they both require that someone implement the
.Fa cb_open
and
.Fa cb_close
members, block devices perform I/O through the
.Xr strategy 9E
entry point and support the
.Xr dump 9E
entry point for kernel crash dumps, while character devices implement the more
historically familiar
.Xr read 9E ,
.Xr write 9E ,
and the
.Xr devmap 9E
entry point for supporting memory-mapping.
.Pp
While the device operations structures worked with the
.Vt dev_info_t
structure and there was one per-instance, character and block operations work
with minor nodes: named entities that exist in the file system.
UNIX has long had the idea of a major and minor number that is encoded in the
.Vt dev_t
which is embedded in the file system, which is what you see in the
.Fa st_rdev
member of the stat structure when you call
.Xr stat 2 .
The major number is assigned to the driver
.Em as a whole ,
not an instance.
The minor number space is shared between all instances of a driver.
Minor node numbers are assigned by the driver when it calls
.Xr ddi_create_minor_node 9F
to create a minor node and when one of its character or block entry points is
called, it will get this minor number back and it must translate it to the
corresponding instance on its own.
.Pp
A special property of the
.Xr open 9E
entry point is that it can change the minor number a client gets during its call
to open which it will use for all subsequent calls.
This is called a
.Dq cloning
open.
Whether this is used or not depends on the type of driver that you are creating.
For example, many pseudo-device drivers like DTrace will use this so each client
has its own state.
Similarly, devices that have certain internal locking and transaction schemes
will give each caller a unique minor.
The
.Xr ccid 4D
and
.Xr nvme 4D
drivers are examples of this.
However, many drivers will have just a single minor node per instance and just
say that the minor node's number is the instance number, making it very simple
to figure out the mapping.
When it's not so simple, often an AVL tree or some other structure is used to
help map this together.
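.Pp
For the simple case where the minor number is the instance number, an
.Xr open 9E
entry point might perform the translation as in the following sketch.
The
.Ql xx
names and the
.Va xx_state
soft state anchor are hypothetical.
.Bd -literal
static int
xx_open(dev_t *devp, int flag, int otyp, cred_t *credp)
{
        xx_t *xx;

        /* This sketch only supports character device opens. */
        if (otyp != OTYP_CHR) {
                return (EINVAL);
        }

        /* Translate the minor number back into per-instance state. */
        xx = ddi_get_soft_state(xx_state, (int)getminor(*devp));
        if (xx == NULL) {
                return (ENXIO);
        }

        return (0);
}
.Ed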
.Pp
The following entry points are generally used for character devices:
.Bl -tag -width Ds
.It Xr ioctl 9E
The I/O control or ioctl entry point is used extensively throughout the system
to perform different kinds of operations.
These operations are often driver specific, though there are also some
common operations that are used across multiple devices like the disk
operations described in
.Xr dkio 4I
or the ioctls that are used under the hood by
.Xr cfgadm 8
and friends.
.Pp
Whether a driver supports ioctls or not is up to the driver.
If it does, it is up to the driver to always perform any requisite privilege and
permission checking as well as take care in copying in and out any kind of
memory from the user process through calls like
.Xr ddi_copyin 9F
and
.Xr ddi_copyout 9F .
.Pp
The ioctl interface gives the driver writer great flexibility to create equally
useful or hard to consume interfaces.
When crafting a new committed interface over an ioctl, take care to ensure there
is an ability to version the structure or use something that has more
flexibility like a
.Vt nvlist_t .
See the
.Sq Copying Data to and from Userland
section of
.Xr Intro 9F
for more information.
.It Xr read 9E , Xr write 9E , Xr aread 9E , and Xr awrite 9E
These are the classic I/O routines of the system.
A driver's read and write routines operate on a
.Xr uio 9S
structure which describes the I/O that is occurring, the offset into the
device that the I/O should occur at, and has various flags that
describe properties of the I/O request, such as whether or not it is a
non-blocking request.
.Pp
The majority of device drivers that implement these entry points are using them
to create some kind of file-like abstraction for a device.
For example, the
.Xr ccid 4D
driver uses these interfaces for submitting commands and reading responses back
from an underlying device.
.Pp
For most use cases
.Xr read 9E
and
.Xr write 9E
are sufficient; however, the
.Xr aread 9E
and
.Xr awrite 9E
are versions that tie into the kernel's asynchronous I/O engine.
.It Xr chpoll 9E
This entry point allows a device to be polled by user code for an event of
interest and connects through the kernel to different polling mechanisms such as
.Xr poll 2 ,
.Xr port_get 3C ,
and many others.
Currently this interface only allows a driver to define the classic poll style
events such as
.Dv POLLIN ,
.Dv POLLOUT ,
and
.Dv POLLHUP .
The exact semantics of these are up to the driver; however, it is expected that
the read and write oriented semantics of the various events will be honored by
the device driver.
.It Xr devmap 9E and Xr segmap 9E
These are entry points that are used to set up memory mappings for a device and
replace the older
.Xr mmap 9E
entry point.
When a process calls
.Xr mmap 2
on a device, it'll reach these, starting with the
.Xr devmap 9E
entry point.
The driver is responsible for confirming that the mapping request and its
semantics are sensible, after which it will set up memory for consumption.
The
.Xr devmap 9E
manual page has more details on the specifics here and the related entry points
that can be implemented as part of the
.Xr devmap_callback_ctl 9S
structures such as
.Xr devmap_access 9E .
The segment mapping is an optional part that provides some additional controls
for a driver such as assigning certain mapping attributes or wanting to maintain
separate contexts for different mappings.
See
.Xr segmap 9E
for more information.
It is common for drivers to just provide a
.Xr devmap 9E
entry point.
.It Xr prop_op 9E
This entry point is used by drivers to manage and deal with property creation.
While this is its own entry point, most callers can just specify
.Xr ddi_prop_op 9F
for this and don't need any special handling.
.El
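.Pp
The ioctl guidance above can be sketched roughly as follows.
The command value, the
.Vt xx_ioc_info_t
structure, and the policy of requiring a privileged caller are all assumptions
made for illustration and are not taken from any real driver.
The structure leads with a version member and uses fixed-width types so that
it has the same layout for 32-bit and 64-bit callers.
.Bd -literal
#define XX_IOC_VERSION          1
#define XX_IOC_GET_INFO         0x78780001

typedef struct xx_ioc_info {
        uint32_t xii_version;
        uint32_t xii_instance;
} xx_ioc_info_t;

static int
xx_ioctl(dev_t dev, int cmd, intptr_t arg, int mode, cred_t *credp,
    int *rvalp)
{
        xx_ioc_info_t info;

        /* Hypothetical policy: only privileged callers may use this. */
        if (drv_priv(credp) != 0) {
                return (EPERM);
        }

        switch (cmd) {
        case XX_IOC_GET_INFO:
                /* Zero the structure so no kernel memory leaks out. */
                bzero(&info, sizeof (info));
                info.xii_version = XX_IOC_VERSION;
                info.xii_instance = getminor(dev);
                if (ddi_copyout(&info, (void *)arg, sizeof (info),
                    mode & FKIOCTL) != 0) {
                        return (EFAULT);
                }
                return (0);
        default:
                return (ENOTTY);
        }
}
.Ed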
.Pp
The following entry points are used uniquely for block devices:
.Bl -tag -width Ds
.It Xr strategy 9E
A driver's strategy entry point is used to actually perform I/O as described by
the
.Xr buf 9S
structure.
It is responsible for allocating all resources and then initiating the actual
request.
The actual request will finish potentially asynchronously through calls to
.Xr biodone 9F
or
.Xr bioerror 9F .
HBA or blkdev-based drivers do not usually end up implementing this interface.
.It Xr dump 9E
A driver's dump implementation is used when the operating system has had a fatal
error and is trying to persist a crash dump to disk.
This is a delicate operation as the system has already failed, which means many
normal operations like interrupt handlers, timeouts, and blocking will no longer
work.
.El
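.Pp
As a rough sketch, a
.Xr strategy 9E
implementation generally either fails the request immediately or hands it off
to hardware and completes it later.
Here
.Fn xx_submit
is a hypothetical helper that queues the request to the device, and the
.Ql xx
names are likewise illustrative.
.Bd -literal
static int
xx_strategy(struct buf *bp)
{
        xx_t *xx;

        xx = ddi_get_soft_state(xx_state, (int)getminor(bp->b_edev));
        if (xx == NULL) {
                bioerror(bp, ENXIO);
                biodone(bp);
                return (0);
        }

        /* Hypothetical helper that queues the request to hardware. */
        if (xx_submit(xx, bp) != 0) {
                bioerror(bp, EIO);
                biodone(bp);
        }

        /*
         * On success, the driver's completion path is expected to
         * call biodone(9F) once the hardware finishes the request.
         */
        return (0);
}
.Ed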
.Pp
In general, the
.Xr print 9E
entry point for block devices is vestigial and users should fill in
.Xr nodev 9F
there instead.
.Pp
The following are some examples of different character and block device
operations structures that drivers have employed.
Note that using C99 structure definitions is preferred:
.Bd -literal
static struct cb_ops ksensor_cb_ops = {
        .cb_open = ksensor_open,
        .cb_close = ksensor_close,
        .cb_strategy = nodev,
        .cb_print = nodev,
        .cb_dump = nodev,
        .cb_read = nodev,
        .cb_write = nodev,
        .cb_ioctl = ksensor_ioctl,
        .cb_devmap = nodev,
        .cb_mmap = nodev,
        .cb_segmap = nodev,
        .cb_chpoll = nochpoll,
        .cb_prop_op = ddi_prop_op,
        .cb_flag = D_MP,
        .cb_rev = CB_REV,
        .cb_aread = nodev,
        .cb_awrite = nodev
};

static struct cb_ops vio9p_cb_ops = {
        .cb_rev =                       CB_REV,
        .cb_flag =                      D_NEW | D_MP,
        .cb_open =                      vio9p_open,
        .cb_close =                     vio9p_close,
        .cb_read =                      vio9p_read,
        .cb_write =                     vio9p_write,
        .cb_ioctl =                     vio9p_ioctl,
        .cb_strategy =                  nodev,
        .cb_print =                     nodev,
        .cb_dump =                      nodev,
        .cb_devmap =                    nodev,
        .cb_mmap =                      nodev,
        .cb_segmap =                    nodev,
        .cb_chpoll =                    nochpoll,
        .cb_prop_op =                   ddi_prop_op,
        .cb_str =                       NULL,
        .cb_aread =                     nodev,
        .cb_awrite =                    nodev,
};

static struct cb_ops bd_cb_ops = {
        bd_open,                /* open */
        bd_close,               /* close */
        bd_strategy,            /* strategy */
        nodev,                  /* print */
        bd_dump,                /* dump */
        bd_read,                /* read */
        bd_write,               /* write */
        bd_ioctl,               /* ioctl */
        nodev,                  /* devmap */
        nodev,                  /* mmap */
        nodev,                  /* segmap */
        nochpoll,               /* poll */
        bd_prop_op,             /* cb_prop_op */
        0,                      /* streamtab  */
        D_64BIT | D_MP,         /* Driver compatibility flag */
        CB_REV,                 /* cb_rev */
        bd_aread,               /* async read */
        bd_awrite               /* async write */
};
.Ed
.Ss Networking Drivers
Networking device drivers come in many forms and flavors.
They may interface to the host via PCIe, USB, be a pseudo-device, or use
something entirely different like SPI
.Pq Serial Peripheral Interface .
The system provides a dedicated networking interface driver framework that is
documented in
.Xr mac 9E .
This framework is sometimes also referred to as GLDv3
.Pq Generic LAN Device version 3 .
.Pp
All networking drivers will still implement a basic
.Vt "struct dev_ops"
and a minimal
.Vt "struct cb_ops" .
The
.Xr mac 9E
framework takes care of implementing all of the standard character device entry
points at the end of the day and instead provides a number of different
networking-specific entry points that take care of things like getting and
setting properties, installing and removing MAC addresses and filters, and
actually transmitting and providing callbacks for receiving packets.
.Pp
Each instance of a device driver will generally have a separate registration
with
.Xr mac 9E .
In other words, there is usually a one to one relationship between a driver
having its
.Xr attach 9E
entry point called and it registering with the
.Xr mac 9E
framework.
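.Pp
A hedged sketch of what that registration might look like when called from a
driver's
.Xr attach 9E
entry point follows.
The
.Vt xx_t
structure, its members, and the
.Va xx_mac_callbacks
structure are hypothetical; see
.Xr mac 9E
for the authoritative description of each field.
.Bd -literal
static int
xx_mac_register(xx_t *xx)
{
        mac_register_t *regp;
        int ret;

        regp = mac_alloc(MAC_VERSION);
        if (regp == NULL) {
                return (ENOMEM);
        }

        regp->m_type_ident = MAC_PLUGIN_IDENT_ETHER;
        regp->m_driver = xx;
        regp->m_dip = xx->xx_dip;
        regp->m_src_addr = xx->xx_mac_addr;
        regp->m_callbacks = &xx_mac_callbacks;
        regp->m_min_sdu = 0;
        regp->m_max_sdu = ETHERMTU;
        regp->m_margin = VLAN_TAGSZ;

        /* The registration structure is not needed after this call. */
        ret = mac_register(regp, &xx->xx_mac_hdl);
        mac_free(regp);

        return (ret);
}
.Ed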
.Ss STREAMS Modules
STREAMS modules are a historical way to provide certain services in the kernel.
For networking device drivers, instead see the prior section and
.Xr mac 9E .
Conceptually STREAMS break things into queues, with one side being designed for
a module to read data and another side for it to write or produce data.
These modules are arranged in a stack, with additional modules being pushed on
for additional processing.
For example, the TTY subsystem has a serial console as a base STREAMS module,
but it then pushes on additional modules like the pseudo-terminal emulation
.Po
.Xr ptem 4M
.Pc ,
the standard line discipline
.Po
.Xr ldterm 4M
.Pc ,
etc.
.Pp
STREAMS drivers don't use the normal character device entry points
.Pq though sometimes they do define them
or even the
.Vt "struct modldrv" .
Instead they use the
.Vt "struct modlstrmod"
which is discussed in
.Xr modlstrmod 9S ,
which in turn requires one to fill out the
.Xr fmodsw 9S ,
.Xr streamtab 9S ,
and
.Xr qinit 9S
structures.
The latter of these has two of the more common entry points:
.Bl -column -offset indent "mac_capab_transceiver" "mac_capab_transceiver"
.It Xr put 9E Ta Xr srv 9E
.El
.Pp
These entry points are used when different kinds of messages are received by the
device driver on a queue.
In addition, the
.Xr qinit 9S
structure defines an alternative set of entry points for
.Xr open 9E
and
.Xr close 9E
as a STREAMS module's open and close routines all operate in the context of a
given
.Vt queue_t .
There are other differences here.
An ioctl is not a dedicated entry point, but rather a specific message type
.Po
.Dv M_IOCTL
.Pc
that is
received in a driver's
.Xr put 9E
routine.
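.Pp
To illustrate, a write-side
.Xr put 9E
routine for a hypothetical module that supports no ioctls of its own might be
sketched as follows.
The
.Fn xx_wput
name is an assumption for illustration.
.Bd -literal
static int
xx_wput(queue_t *q, mblk_t *mp)
{
        switch (mp->b_datap->db_type) {
        case M_IOCTL:
                /* This sketch supports no ioctls, so reject them all. */
                miocnak(q, mp, 0, EINVAL);
                break;
        default:
                /* Pass everything else along to the next module. */
                putnext(q, mp);
                break;
        }

        return (0);
}
.Ed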
.Pp
Finally, it's worth noting the
.Xr mt-streams 9F
manual page which discusses several concurrency related considerations for
STREAMS related drivers.
.Ss HBA Drivers
Host bus adapters are used to interface with the various SCSI and SAS
controllers.
Like with networking, the kernel provides a framework under the name of SCSA.
HBA drivers still often implement character device entry points; however, they
generally end up calling into shared framework entry points for
.Xr open 9E ,
.Xr ioctl 9E ,
and
.Xr close 9E .
For several of the concepts related to the third version of the framework, see
.Xr iport 9 .
.Pp
The following entry points are associated with HBA drivers:
.Bl -column -offset indent "mac_capab_transceiver" "mac_capab_transceiver"
.It Xr tran_abort 9E Ta Xr tran_bus_reset 9E
.It Xr tran_dmafree 9E Ta Xr tran_getcap 9E
.It Xr tran_init_pkt 9E Ta Xr tran_quiesce 9E
.It Xr tran_reset 9E Ta Xr tran_reset_notify 9E
.It Xr tran_setup_pkt 9E Ta Xr tran_start 9E
.It Xr tran_sync_pkt 9E Ta Xr tran_tgt_free 9E
.It Xr tran_tgt_init 9E Ta Xr tran_tgt_probe 9E
.El
.Pp
In addition to these, when using SCSAv3 with iports, drivers will call
.Xr scsi_hba_iport_register 9F
to create various iports.
This has the unique effect of causing the driver's top-level
.Xr attach 9E
entry point to be called again, but referring to the iport instead of the main
hardware instance.
.Ss USB Drivers
The kernel provides a framework for USB client devices to access various USB
services such as getting access to device and configuration descriptors, issuing
control, bulk, interrupt, and isochronous requests, and being notified when they
are removed from the system.
Generally a USB device driver leverages a framework of some kind, like
.Xr mac 9E
in addition to the USB pieces.
As such, there are no entry points specific to USB device drivers; however,
there are plenty of provided functions.
.Pp
To get started with a USB device driver, one will generally perform some of the
following steps:
.Bl -enum
.It
Register with the USB framework by calling
.Xr usb_client_attach 9F .
.It
Ask the kernel to fetch all of the device and class descriptors that are
appropriate with the
.Xr usb_get_dev_data 9F
function.
.It
Parse the relevant descriptors to figure out which endpoints to attach.
.It
Open up pipes to the specific USB endpoints by using
.Xr usb_lookup_ep_data 9F ,
.Xr usb_ep_xdescr_fill 9F ,
and
.Xr usb_pipe_xopen 9F .
.It
Proceed with the rest of device initialization and service.
.El
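.Pp
The steps above can be sketched as follows for a hypothetical device with a
single bulk-in endpoint.
The
.Vt xx_t
structure and its members are assumptions made for illustration, and the
descriptor parsing is reduced to a single endpoint lookup.
.Bd -literal
static int
xx_usb_setup(xx_t *xx, dev_info_t *dip)
{
        usb_ep_data_t *ep;
        usb_ep_xdescr_t xdesc;
        usb_pipe_policy_t policy;

        if (usb_client_attach(dip, USBDRV_VERSION, 0) != USB_SUCCESS) {
                return (DDI_FAILURE);
        }

        if (usb_get_dev_data(dip, &xx->xx_dev_data, USB_PARSE_LVL_IF,
            0) != USB_SUCCESS) {
                usb_client_detach(dip, NULL);
                return (DDI_FAILURE);
        }

        /* Assumption: the device has a bulk-in endpoint that we want. */
        ep = usb_lookup_ep_data(dip, xx->xx_dev_data,
            xx->xx_dev_data->dev_curr_if, 0, 0, USB_EP_ATTR_BULK,
            USB_EP_DIR_IN);
        if (ep == NULL) {
                goto fail;
        }

        if (usb_ep_xdescr_fill(USB_EP_XDESCR_CURRENT_VERSION, dip, ep,
            &xdesc) != USB_SUCCESS) {
                goto fail;
        }

        bzero(&policy, sizeof (policy));
        policy.pp_max_async_reqs = 1;
        if (usb_pipe_xopen(dip, &xdesc, &policy, USB_FLAGS_SLEEP,
            &xx->xx_bulkin_pipe) != USB_SUCCESS) {
                goto fail;
        }

        return (DDI_SUCCESS);

fail:
        /* Detaching also releases the device data fetched above. */
        usb_client_detach(dip, xx->xx_dev_data);
        return (DDI_FAILURE);
}
.Ed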
.Ss Virtio Drivers
The kernel provides an uncommitted interface for Virtio device drivers, which is
discussed in some detail in
.Pa uts/common/io/virtio/virtio.h .
A client device driver will register with the framework and then use that
registration to begin feature and interrupt negotiation.
As part of that, they are given the ability to set up virtqueues which can be
used for communicating to and from the hypervisor.
.Ss Kernel Statistics
Drivers have the ability to export kstats
.Pq kernel statistics
that will appear in the
.Xr kstat 8
command.
Any kind of module in the system can create and register a kstat; a kstat is not
strictly tied to anything like a
.Vt dev_info_t .
Kstats come in several different types.
The most common kstat type is
.Dv KSTAT_TYPE_NAMED ,
which allows for multiple, typed name-value pairs to be part of the stat.
This is what the kernel uses under the hood for many things such as the various
.Xr mac 9E
statistics that are managed on behalf of drivers.
.Pp
To create a kstat, a driver utilizes the
.Xr kstat_create 9F
function, after which it has a chance to set up the kstat and make choices about
which entry points it will implement.
A kstat will not be made visible until the caller calls
.Xr kstat_install 9F
on it.
The two entry points that a driver may implement are:
.Bl -column -offset indent "mac_capab_transceiver" "mac_capab_transceiver"
.It Xr ks_snapshot 9E Ta Xr ks_update 9E
.El
.Pp
First, let's discuss the
.Xr ks_update 9E
entry point.
A kstat may be updated in one of two ways: either by having its
.Xr ks_update 9E
function called or by having the system update the information in the kstat's
data as it goes.
One would use the former when it involves doing something like going out to
hardware and reading registers, whereas the latter approach might be used when
operations can be tracked as part of a normal flow, such as the number of errors
or particular requests a driver has encountered.
The
.Xr ks_snapshot 9E
entry point is not as commonly used by comparison and allows a caller to
interpose on the data marshalling process for copying out to userland.
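.Pp
As a rough sketch of the flow described above, the following creates a named
kstat with a single 64-bit counter that is refreshed from hardware in its
.Xr ks_update 9E
entry point.
The
.Vt example_stats_t
structure, the
.Fn example_read_rx_errors
helper, and the module, kstat, and statistic names are hypothetical and are
used only for illustration.
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/errno.h>
#include <sys/kstat.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>

/* Hypothetical statistics layout; not taken from any real driver. */
typedef struct example_stats {
    kstat_named_t es_rx_errors;
} example_stats_t;

/* Hypothetical helper that would read a counter from hardware. */
extern uint64_t example_read_rx_errors(void *);

static int
example_ks_update(kstat_t *ksp, int rw)
{
    example_stats_t *es = ksp->ks_data;

    /* These statistics are read-only. */
    if (rw == KSTAT_WRITE)
        return (EACCES);

    es->es_rx_errors.value.ui64 =
        example_read_rx_errors(ksp->ks_private);
    return (0);
}

static kstat_t *
example_kstat_init(dev_info_t *dip, void *hw)
{
    kstat_t *ksp;
    example_stats_t *es;

    ksp = kstat_create("example", ddi_get_instance(dip), "stats",
        "misc", KSTAT_TYPE_NAMED,
        sizeof (example_stats_t) / sizeof (kstat_named_t), 0);
    if (ksp == NULL)
        return (NULL);

    es = ksp->ks_data;
    kstat_named_init(&es->es_rx_errors, "rx_errors",
        KSTAT_DATA_UINT64);
    ksp->ks_update = example_ks_update;
    ksp->ks_private = hw;

    /* The kstat is not visible until it is installed. */
    kstat_install(ksp);
    return (ksp);
}
.Ed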
.Ss Upgradable Firmware Modules
The UFM
.Pq Upgradable Firmware Module
system in the kernel allows a device driver to provide information about the
firmware modules that are present on a device and is generally used as
supplementary information about a device.
The UFM framework allows a driver to declare a given number of modules that
exist on a given
.Vt dev_info_t .
Each module has some number of slots with different versions.
This information is automatically exported into various consumers such as
.Xr fwflash 8 ,
the Fault Management Architecture,
and the
.Xr ufm 4D
driver's specific ioctls.
.Pp
A driver fills in the operations vector discussed in
.Xr ddi_ufm 9E
and registers it with the kernel by calling
.Xr ddi_ufm_init 9F .
The entry points for this interface include:
.Bl -column -offset indent "ddi_ufm_op_fill_image(9E)" "ddi_ufm_op_fill_image(9E)"
.It Xr ddi_ufm_op_getcaps 9E Ta Xr ddi_ufm_op_nimages 9E
.It Xr ddi_ufm_op_fill_image 9E Ta Xr ddi_ufm_op_fill_slot 9E
.It Xr ddi_ufm_op_readimg 9E Ta
.El
.Pp
The
.Xr ddi_ufm_op_getcaps 9E
entry point describes the capabilities of the device and what other entry points
the kernel and callers can expect to exist.
The
.Xr ddi_ufm_op_nimages 9E
entry point tells the system how many images there are; if it is not
implemented, then the system assumes there is a single image.
The
.Xr ddi_ufm_op_fill_image 9E
and
.Xr ddi_ufm_op_fill_slot 9E
entry points are used to fill in information about images and slots
respectively, while the
.Xr ddi_ufm_op_readimg 9E
entry point is used to read an image from the device for the operating system.
That entry point is often supported when dealing with EEPROMs as many devices do
not have a way of retrieving the actual current firmware.
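.Pp
A minimal sketch of a driver that reports a single image with a single slot
might look like the following.
All of the
.Sq example_
names, the description string, and the version string are hypothetical.
The sketch leaves out
.Xr ddi_ufm_op_nimages 9E ,
relying on the system assuming a single image, and it assumes that the driver
calls
.Xr ddi_ufm_update 9F
once it is ready to answer requests.
.Bd -literal -offset indent
#include <sys/errno.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>
#include <sys/ddi_ufm.h>

static int
example_ufm_getcaps(ddi_ufm_handle_t *ufmh, void *arg,
    ddi_ufm_cap_t *caps)
{
    /* This device only reports versions; it cannot read images. */
    *caps = DDI_UFM_CAP_REPORT;
    return (0);
}

static int
example_ufm_fill_image(ddi_ufm_handle_t *ufmh, void *arg,
    uint_t imgno, ddi_ufm_image_t *img)
{
    if (imgno != 0)
        return (EINVAL);
    ddi_ufm_image_set_desc(img, "Example Firmware");
    ddi_ufm_image_set_nslots(img, 1);
    return (0);
}

static int
example_ufm_fill_slot(ddi_ufm_handle_t *ufmh, void *arg,
    uint_t imgno, uint_t slotno, ddi_ufm_slot_t *slot)
{
    if (imgno != 0 || slotno != 0)
        return (EINVAL);
    /* A real driver would read the version from the device. */
    ddi_ufm_slot_set_version(slot, "1.0.0");
    ddi_ufm_slot_set_attrs(slot, DDI_UFM_ATTR_ACTIVE);
    return (0);
}

static ddi_ufm_ops_t example_ufm_ops = {
    .ddi_ufm_op_fill_image = example_ufm_fill_image,
    .ddi_ufm_op_fill_slot = example_ufm_fill_slot,
    .ddi_ufm_op_getcaps = example_ufm_getcaps
};

static int
example_ufm_register(dev_info_t *dip, void *arg,
    ddi_ufm_handle_t **ufmhp)
{
    if (ddi_ufm_init(dip, DDI_UFM_CURRENT_VERSION, &example_ufm_ops,
        ufmhp, arg) != 0)
        return (DDI_FAILURE);

    /* Tell the framework that firmware information is available. */
    ddi_ufm_update(*ufmhp);
    return (DDI_SUCCESS);
}
.Ed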
.Ss USB Host Interface Drivers
Opposite of USB device drivers are the device drivers that make the USB
abstractions work: USB host interface controllers.
The kernel provides a private framework for these, which is discussed in
.Xr usba_hcdi 9E .
An HCDI driver is a character device driver that also instantiates a root
hub as part of its operation and forwards many of its open, close, and ioctl
routines to the corresponding usba hubdi functions.
.Pp
To get started with the framework, a driver will need to call
.Xr usba_hcdi_register 9F
with a filled out
.Xr usba_hcdi_register_args_t 9S
structure.
That registration structure includes the operation vector of callbacks that the
driver fills in, which involve opening and closing pipes
.Po
.Xr usba_hcdi_pipe_open 9E
.Pc ,
issuing the various control, interrupt, bulk, and isochronous transfers
.Po
.Xr usba_hcdi_pipe_bulk_xfer 9E ,
etc.
.Pc ,
and more.
.Sh DTRACE PROBES
By default, the DTrace
.Xr fbt 4D ,
function boundary tracing,
provider will create DTrace probes based on the entry and return points
of most functions in a module
.Pq the primary exception being for some hand-written assembler .
While this is very powerful, driver writers often want to define their own
semantic probes.
The
.Xr sdt 4D ,
statically defined tracing, provider can be used for this.
.Pp
To define an SDT probe, a driver should include
.In sys/sdt.h ,
which defines several macros for probes based on the number of arguments
that are present.
Each probe takes a name, which is constrained by the rules of a C
identifier.
If two underscore characters are present in a row
.Pq Sq __ ,
they will be transformed into a hyphen
.Pq Sq - .
That is, a probe declared with a name of
.Sq hello__world
will be named
.Sq hello-world
and accessible as the DTrace probe
.Ql sdt:::hello-world .
.Pp
Each probe can present a varying number of arguments in DTrace, ranging
from 0 to 8.
For each DTrace probe argument, one passes both the type of the argument
and the actual value.
The following example from the
.Xr igc 4D
driver shows a DTrace probe that provides four arguments and would be
accessible using the probe
.Ql sdt:::igc-context-desc :
.Bd -literal -offset indent
DTRACE_PROBE4(igc__context__desc, igc_t *, igc, igc_tx_ring_t *,
    ring, igc_tx_state_t *, tx, struct igc_adv_tx_context_desc *,
    ctx);
.Ed
.Pp
In the above example,
.Fa igc ,
.Fa ring ,
.Fa tx ,
and
.Fa ctx
are local variables and function parameters.
.Pp
By default, SDT probes are considered
.Sy Volatile ;
in other words, they can change at any time and disappear.
This is intended to encourage widespread use of SDT probes for whatever may be
useful for a particular problem or issue that is being investigated.
SDT probes that are stabilized are transformed into their own first
class provider.
.Sh SEE ALSO
.Xr Intro 9 ,
.Xr Intro 9F ,
.Xr Intro 9S
